Rigorous Bounds to Retarded Learning 

Using an elegant approach, Herschkowitz and Opper [1]
established rigorous bounds on the information inferable
(learnable) from a set of data ($m$ points $\in \mathbb{R}^N$)
when the latter are drawn from a distribution
$P(\mathbf{x}) = P_0(\mathbf{x}) \exp[-V(\lambda)]$, where
$P_0(\mathbf{x})$ is a spherical normal law and
$\exp[-V(\lambda)]$ is a modulation along an unknown anisotropy
axis $\lambda = \mathbf{w} \cdot \mathbf{x}$, for some direction
$\mathbf{w}$. They show in particular that if $P(\mathbf{x})$ has
zero mean, it is impossible to learn the direction of anisotropy
below a critical fraction of data $\alpha^*$, and claim that
$\alpha^* = \alpha_{lb} = (1 - \langle\lambda^2\rangle)^{-2}$
only depends on $\langle\lambda^2\rangle$, the second moment of
the distribution along $\lambda$,
$P(\lambda) = e^{-\lambda^2/2} \exp[-V(\lambda)]/\sqrt{2\pi}$.

The authors reach this conclusion by an expansion at small $q$
of the upper bound to $\Delta R$, the difference between the
trivial risk and the cumulative Bayes risk. In the thermodynamic
limit ($m \to \infty$, $N \to \infty$ with $\alpha = m/N$ finite)
the upper bound [2] is given by
$\max_q G_\alpha(q) \equiv G_\alpha(q^*)$, with

$G_\alpha(q) = \frac{1}{2}\ln(1-q^2) + \alpha \ln \int Dx\, Dy\, \exp[-V(x) - V(xq + y\sqrt{1-q^2})]$

and $Dx = e^{-x^2/2}\, dx/\sqrt{2\pi}$. The authors show that
$q = 0$ is always a maximum for $\alpha < \alpha_{lb}$. In the
case of highly anisotropic data distributions, when the learning
task is simple enough that only the variance matters, this leads
to $\alpha^* = \alpha_{lb}$. However, they disregarded the
possibility of having other extrema, which we expect to exist
[^|J^] if there is some structure in the data along $\lambda$.
We show here that the global maximum may jump from $q^* = 0$ to
a finite value $q^* = q_1$, at $\alpha^* = \alpha_1 < \alpha_{lb}$.
Consider data with components along $\lambda$ drawn according to

$P(\lambda) = \frac{1}{2\sigma\sqrt{2\pi}} \sum_{\tau=\pm 1} \exp[-(\lambda - \tau\rho)^2/(2\sigma^2)]$.

A straightforward calculation shows that $G_\alpha(q)$ has
indeed a maximum at $q_1 > 0$, which may overcome the one at
$q = 0$ for some values of $\rho$ and $\sigma$. This first order
phase transition of the upper bound signals the onset of a phase
where learning is possible at $\alpha_1 < \alpha_{lb}$. On the
Figure we represent $\alpha_1$ and $\alpha_{lb}$ as a function of
$\rho$, for $\sigma = 0.5$.
It may be seen that $\alpha_1(\rho)$ leaves $\alpha_{lb}(\rho)$
with a discontinuous slope at $\rho = 0.7023(7)$
($\alpha_1 = \alpha_{lb} = 15.177(9)$, $q_1 = 0.900(5)$) (the
inset shows the two maxima of $G(q)$), but smoothly at
$\rho = 1.338(1)$ ($q_1 = 0$, $\alpha_1 = \alpha_{lb} = 0.96$).
In the latter case, both the second and fourth order
coefficients of the $q$ expansion of $G_\alpha(q)$ vanish at the
transition. The other inset represents $G_\alpha(q)$ for
$\rho = 1$; the transition occurs at
$\alpha_1 = 4.477(5) < \alpha_{lb} = 16$, at which $q^*$ jumps
from $0$ to $q_1 \simeq 0.876(4)$.
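
The values quoted for $\rho = 1$, $\sigma = 0.5$ can be cross-checked numerically. The sketch below is not the calculation of the text but a hedged reconstruction under three assumptions: (i) $P(\lambda)$ is an equal-weight mixture of two Gaussians centred at $\pm\rho$ with width $\sigma < 1$; (ii) $G_\alpha(q) = \frac{1}{2}\ln(1-q^2) + \alpha \ln F(q)$, with $F(q)$ the double Gaussian integral of $\exp[-V(x) - V(xq + y\sqrt{1-q^2})]$; (iii) $F(q)$ is evaluated with a closed form obtained here from Mehler's formula. All function names are hypothetical. Since $G_\alpha(0) = 0$, the first order threshold is the smallest $\alpha$ at which $G_\alpha(q) = 0$ for some $q > 0$, that is, the minimum over $q$ of $-\frac{1}{2}\ln(1-q^2)/\ln F(q)$.

```python
import math


def log_F(q, rho, sigma):
    """ln F(q) for the ASSUMED P(lambda): equal-weight Gaussians at +/-rho.

    Closed form derived for this sketch from Mehler's formula (not quoted
    from the text). Requires 0 <= q < 1 and sigma < 1.
    """
    s = 1.0 - sigma ** 2
    Q = q * s                      # rescaled overlap entering the kernel
    u2 = rho ** 2 / s
    d = 1.0 - Q ** 2
    z = Q * u2 / d                 # z >= 0 for q >= 0
    # Numerically stable ln cosh(z) for z >= 0.
    log_cosh = z + math.log1p(math.exp(-2.0 * z)) - math.log(2.0)
    return -0.5 * math.log(d) - Q ** 2 * u2 / d + log_cosh


def G(q, alpha, rho, sigma):
    """Assumed form of G_alpha(q); G(0) = 0 by normalization."""
    return 0.5 * math.log(1.0 - q ** 2) + alpha * log_F(q, rho, sigma)


def alpha_lb(rho, sigma):
    """(1 - <lambda^2>)^(-2) with <lambda^2> = rho^2 + sigma^2 (assumed mixture)."""
    return 1.0 / (1.0 - (rho ** 2 + sigma ** 2)) ** 2


def first_order_threshold(rho, sigma, n=10000):
    """Grid scan for the smallest alpha at which G_alpha(q) = 0 for some q > 0.

    Returns (alpha_1, q_1). If the minimum sits at the smallest grid point,
    the transition is continuous and alpha_1 is close to alpha_lb.
    """
    best_alpha, best_q = math.inf, 0.0
    for i in range(1, n):
        q = i / n
        lf = log_F(q, rho, sigma)
        if lf <= 0.0:              # no gain at this q, cannot reach G = 0
            continue
        a = -0.5 * math.log(1.0 - q ** 2) / lf
        if a < best_alpha:
            best_alpha, best_q = a, q
    return best_alpha, best_q


if __name__ == "__main__":
    rho, sigma = 1.0, 0.5
    a1, q1 = first_order_threshold(rho, sigma)
    print("alpha_lb =", alpha_lb(rho, sigma))
    print("alpha_1  =", a1, " q_1 =", q1)
```

Under these assumptions `alpha_lb(1.0, 0.5)` evaluates to 16, and the scan should land close to the quoted $\alpha_1 = 4.477(5)$ and $q_1 \simeq 0.876(4)$. The grid scan is chosen over a root finder because the function has two competing maxima and a local method started near $q = 0$ would miss the one at finite $q$.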


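In some limiting cases, the first order transition may occur
with a jump of $q^*$ from $0$ directly to $q^* = 1$ at
$\alpha^* \simeq \alpha_1 = 1$. This arises, for example, for

$P(\lambda) = \frac{a}{2} \sum_{\tau=\pm 1} \delta(\lambda - \tau\rho) + \frac{1-a}{2} \sum_{\tau=\pm 1} \delta(\lambda - \tau\rho')$,

when $a = 0.2$, $\rho = 0.5$, $\rho' = 1.4$.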

One of the main conclusions of ref. [1], based on this upper
bound, is that retarded learning exists whenever
$\langle\lambda\rangle = 0$. This conclusion is not invalidated
by the present analysis: although there is no simple and general
expression for $\alpha^*$, it can be shown that
$0 < \alpha^* \le \alpha_{lb}$.
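
The small-$q$ argument behind these statements can be sketched as follows. This is a hedged reconstruction rather than a derivation given in the text: it assumes $G_\alpha(q) = \frac{1}{2}\ln(1-q^2) + \alpha \ln F(q)$ with $F(q) = \int Dx\, Dy\, \exp[-V(x) - V(xq + y\sqrt{1-q^2})]$, the normalization $\int Dx\, e^{-V(x)} = 1$, and the expansion of the bivariate Gaussian kernel in Hermite polynomials $\mathrm{He}_k$ (Mehler's formula); averages $\langle\cdot\rangle$ are taken over $P(\lambda)$.

```latex
\begin{align}
F(q) &= \sum_{k \ge 0} \frac{q^k}{k!}\, \langle \mathrm{He}_k(\lambda) \rangle^2
      = 1 + q\, \langle\lambda\rangle^2
        + \frac{q^2}{2} \bigl( \langle\lambda^2\rangle - 1 \bigr)^2 + O(q^3), \\
G_\alpha(q) &= \frac{q^2}{2} \Bigl[ \alpha \bigl( 1 - \langle\lambda^2\rangle \bigr)^2 - 1 \Bigr] + O(q^3)
      \qquad \text{for } \langle\lambda\rangle = 0 .
\end{align}
```

With zero mean, $q = 0$ is therefore a local maximum as long as $\alpha < (1 - \langle\lambda^2\rangle)^{-2} = \alpha_{lb}$, whereas a nonzero mean contributes a term $\alpha q \langle\lambda\rangle^2$ linear in $q$, so that $q = 0$ is not a maximum for any $\alpha > 0$. The expansion is local: it says nothing about maxima at finite $q$, which is why the global maximum can move away from $q = 0$ before $\alpha$ reaches $\alpha_{lb}$.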



FIG. 1. Lower bound to the fraction of examples $\alpha$ below
which learning is impossible, as a function of $\rho$ for
$\sigma = 0.5$.

[2] $\min_q$ erroneously stands for $\max_q$ in eq. (7) of [1].